Engineering the LOUDS Succinct Tree Representation
نویسندگان
چکیده
Ordinal trees are arbitrary rooted trees where the children of each node are ordered. We consider succinct, or highly space-efficient, representations of (static) ordinal trees with n nodes that use 2n+ o(n) bits of space to represent ordinal trees. There are a number of such representations: each supports a different set of tree operations in O(1) time on the RAM model. In this paper we focus on the practical performance the fundamental Level-Order Unary Degree Sequence (LOUDS) representation [Jacobson, Proc. 30th FOCS, 549–554, 1989]. Due to its conceptual simplicity, LOUDS would appear to be a representation with good practical performance. A tree can also be represented succinctly as a balanced parenthesis sequence [Munro and Raman, SIAM J. Comput. 31 (2001), 762–776; Jacobson, op. cit.; Geary et al. Proc. 15th CPM Symp., LNCS 3109, pp. 159–172, 2004]. In essence, the two representations are complementary, and have only the basic navigational operations in common (parent, first-child, last-child, prev-sibling, next-sibling). Unfortunately, a naive implementation of LOUDS is not competitive with the parenthesis implementation of Geary et al. on the common set of operations. We propose variants of LOUDS, of which one, called LOUDS++, is competitive with the parenthesis representation. A motivation is the succinct representation of large static XML documents, and our tests involve traversing XML documents in various canonical orders.
منابع مشابه
Succinct Trees in Practice
We implement and compare the major current techniques for representing general trees in succinct form. This is important because a general tree of n nodes is usually represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we study require just 2n + o(n) bits and carry out many sophisticated operations in constant time. Yet, there is no exhaustive study in th...
متن کاملA Succinct N-gram Language Model
Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N gram language models based on LOUDS, a succinct data structure. LOUDS succinctly represents a trie with M nodes as a 2M + 1 bit string. We compress it further for the N -gram language model structure. We also use ‘variable length coding’ and ‘block-wise compression’ to comp...
متن کاملCompression of double array structures for fixed length keywords
Trie is one of the data structures for keyword matching. The trie is used in natural language processing, IP address routing, and so on. It is represented by the matrix form, the link form, the double array, and LOUDS. The double array combines retrieval speed of the matrix form with compactness of the list form. LOUDS is a succinct data structure using bit-string. Retrieval speed of LOUDS is n...
متن کاملSuccinct representation of labeled trees
We give a representation for labeled ordered trees that supports labeled queries such as finding the i-th ancestor of a node with a given label. Our representation is succinct, namely the redundancy is small-o of the optimal space for storing the tree. This improves the representation of He et al. [ISAAC 2012] which is succinct only when the entropy of the labels is ω(1).
متن کاملUltra-succinct representation of ordered trees with applications
a r t i c l e i n f o a b s t r a c t There exist two well-known succinct representations of ordered trees: BP (balanced parenthesis) (Munro and Raman, 2001) [20] and DFUDS (depth first unary degree sequence) (Benoit et al., 2005) [1]. Both have size 2n + o(n) bits for n-node trees, which asymptotically matches the information-theoretic lower bound. Importantly, many fundamental operations on t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006